Data processing

ABSTRACT

A data processor comprises a plurality of interconnected real processing units arranged to emulate the operation of an emulated processor having a plurality of interconnected emulated processing units. At least one emulated processing unit is emulated by contributions from two or more real processing units; and at least one real processing unit contributes to emulating two or more emulated processing units.

This invention relates to data processing.

As an example of data processing, electronic games are well known and may be supplied on a variety of distribution media, such as magnetic and/or optical discs. General computers or dedicated games consoles may be used to play these games.

There is sometimes a need to emulate the operation of one processor on another processor. That is to say, the emulating processor runs native program code arranged so that such native instructions or groups of native instructions have the same effect as data processing instructions relating to the emulated system.

A situation in which this need arises is where a data processor has been upgraded by the manufacturer to a new “generation” (for example, a new hardware architecture or instruction protocol), but the manufacturer still wants software relating to the older generation device to be handled (so-called backwards compatibility). Often the only way of achieving this is for the newer generation device to run emulation software which in turn acts in response to instructions relating to the older generation device. In this case, while it is of course acknowledged that running an emulation is generally much more processor-intensive than running native software, the general trend of generational improvements in the performance of data processing hardware is such that the increased processing overhead can usually be handled.

This invention provides a data processor comprising a plurality of interconnected real processing units arranged to emulate the operation of an emulated processor having a plurality of interconnected emulated processing units, in which:

at least one emulated processing unit is emulated by contributions from two or more real processing units; and

at least one real processing unit contributes to emulating two or more emulated processing units.

The invention addresses a problem relevant to an emulating system which uses a multi-processor architecture, particularly (though not exclusively) one in which communication between processors (in the emulating system) is relatively slow compared to the general speed of operation of the emulating system. The invention recognises that a division of the emulation of an emulated processing unit between two (or more) emulating processing units can reduce the message traffic needed to provide communication between the emulations of those emulated processing units. Similarly, by grouping together (on a single emulating processing unit) the emulation of multiple processing units which normally communicate heavily with one another, once again the message traffic needed to provide communication between the emulations of those emulated processing units can be greatly reduced. These measures, taken together, can provide a faster and more efficient emulation.
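By way of illustration only, the effect of grouping heavily communicating emulated processing units onto a single emulating processing unit can be sketched as follows. The traffic figures, the unit labels "real0" to "real4" and the function name are hypothetical and are chosen purely for the example; the sketch covers only the grouping measure, not the division of one emulated unit between several emulating units.

```python
# Hypothetical message counts between pairs of emulated processing units.
# The numbers are illustrative assumptions, not measurements.
TRAFFIC = {
    ("CPU core", "FPU"): 900,
    ("CPU core", "VU0"): 700,
    ("VU1", "GIF"): 800,
    ("CPU core", "VU1"): 50,
}


def cross_unit_messages(placement, traffic):
    """Count messages that must travel between different real units.

    placement maps each emulated unit name to the (hypothetical) real
    processing unit that hosts its emulation.
    """
    total = 0
    for (src, dst), count in traffic.items():
        if placement[src] != placement[dst]:
            total += count
    return total


# One emulated unit per real unit: every message crosses the interconnect.
one_to_one = {
    "CPU core": "real0", "FPU": "real1", "VU0": "real2",
    "VU1": "real3", "GIF": "real4",
}

# Heavily communicating emulated units share a real unit.
grouped = {
    "CPU core": "real0", "FPU": "real0", "VU0": "real0",
    "VU1": "real1", "GIF": "real1",
}

print(cross_unit_messages(one_to_one, TRAFFIC))
print(cross_unit_messages(grouped, TRAFFIC))
```

Under these assumed figures, only the lightly used CPU core to VU1 path remains as traffic between real units in the grouped placement.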

Various further respective aspects and features of the invention are defined in the appended claims.

Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:

FIG. 1 schematically illustrates the overall system architecture of the PlayStation2;

FIG. 2 schematically illustrates the architecture of an Emotion Engine;

FIG. 3 schematically illustrates the configuration of a Graphics Synthesiser;

FIG. 4 schematically illustrates the structure of an emulating processor, in particular a Sony® PlayStation 3® device;

FIG. 5 schematically illustrates a cell processor;

FIG. 6 schematically illustrates a graphics unit; and

FIG. 7 schematically illustrates logical interactions within the emulating processor.

Referring now to the drawings, FIG. 1 schematically illustrates the overall system architecture of the PlayStation2 computer games machine. A system unit 10 is provided, with various peripheral devices connectable to the system unit.

The system unit 10 comprises: an Emotion Engine 100; a Graphics Synthesiser 200; a sound processor unit 300 having dynamic random access memory (DRAM); a read only memory (ROM) 400; a compact disc (CD) and digital versatile disc (DVD) reader 450; a Rambus Dynamic Random Access Memory (RDRAM) unit 500; and an input/output processor (IOP) 700 with dedicated RAM 750. An (optional) external hard disk drive (HDD) 390 may be connected.

The input/output processor 700 has two Universal Serial Bus (USB) ports 715 and an iLink or IEEE 1394 port (iLink is the Sony Corporation implementation of the IEEE 1394 standard) (not shown). The IOP 700 handles all USB, iLink and game controller data traffic. For example, when a user is playing a game, the IOP 700 receives data from the game controller and directs it to the Emotion Engine 100 which updates the current state of the game accordingly. The IOP 700 has a Direct Memory Access (DMA) architecture to facilitate rapid data transfer rates. DMA involves transfer of data from main memory to a device without passing it through the CPU. The USB interface is compatible with Open Host Controller Interface (OHCI) and can handle data transfer rates of between 1.5 Mbps and 12 Mbps. Provision of these interfaces means that the PlayStation2 is potentially compatible with peripheral devices such as digital video cassette recorders (VCRs) e.g. camcorders, digital cameras, microphones, printers, and input devices such as a keyboard, mouse and joystick.

Generally, in order for successful data communication to occur with a peripheral device connected to a USB port 715, an appropriate piece of software such as a device driver should be provided. Device driver technology is very well known and will not be described in detail here, except to say that the skilled man will be aware that a device driver or similar software interface may be required in the embodiment described here.

In the present embodiment, a USB microphone 730 is connected to the USB port. It will be appreciated that the USB microphone 730 may be a hand-held microphone or may form part of a head-set that is worn by the human operator. The advantage of wearing a head-set is that the human operator's hands are free to perform other actions. The microphone includes an analogue-to-digital converter (ADC) and a basic hardware-based real-time data compression and encoding arrangement, so that audio data are transmitted by the microphone 730 to the USB port 715 in an appropriate format, such as a streaming compressed audio format for decoding at the PlayStation 2 system unit 10.

Apart from the USB ports, two other ports 705, 710 are proprietary sockets allowing the connection of a proprietary non-volatile RAM memory card 720 for storing game-related information, a hand-held game controller 725 or a device (not shown) mimicking a hand-held controller, such as a dance mat.

The system unit 10 may be connected to a network adapter 805 that provides an interface (such as an Ethernet interface) to a network. This network may be, for example, a LAN, a WAN or the Internet. The network may be a general network or one that is dedicated to game related communication. The network adapter 805 allows data to be transmitted to and received from other system units 10 that are connected to the same network (the other system units 10 also having corresponding network adapters 805).

The Emotion Engine 100 is a 128-bit Central Processing Unit (CPU) that has been specifically designed for efficient simulation of 3-dimensional (3D) graphics for games applications. The Emotion Engine components include a data bus, cache memory (part of its CPU core) and registers, all of which are 128-bit. This facilitates fast processing of large volumes of multi-media data. Conventional PCs, by way of comparison, have a basic 64-bit data structure. The floating point calculation performance of the PlayStation2 is 6.2 GFLOPs. The Emotion Engine also comprises MPEG2 decoder circuitry which allows for simultaneous processing of 3D graphics data and DVD data. The Emotion Engine performs geometrical calculations including mathematical transforms and translations and also performs calculations associated with the physics of simulation objects, for example, calculation of friction between two objects. It produces sequences of image rendering commands which are subsequently utilised by the Graphics Synthesiser 200. The image rendering commands are output in the form of display lists. A display list is a sequence of drawing commands that specifies to the Graphics Synthesiser which primitive graphic objects (e.g. points, lines, triangles, sprites) to draw on the screen and at which co-ordinates. Thus a typical display list will comprise commands to draw vertices, commands to shade the faces of polygons, render bitmaps and so on. The Emotion Engine 100 can asynchronously generate multiple display lists.
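Purely as an illustrative sketch, a display list of the kind described above may be modelled as an ordered sequence of drawing commands, each naming a primitive and its screen co-ordinates. The class name DrawCommand, its fields and the helper count_primitives are assumptions made for this example and do not reflect the actual packet format used by the Graphics Synthesiser.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DrawCommand:
    """One hypothetical drawing command in a display list."""
    primitive: str                    # e.g. "point", "line", "triangle", "sprite"
    vertices: List[Tuple[int, int]]   # screen co-ordinates of the primitive


def count_primitives(display_list: List[DrawCommand]) -> Dict[str, int]:
    """Summarise a display list by counting commands per primitive type."""
    counts: Dict[str, int] = {}
    for command in display_list:
        counts[command.primitive] = counts.get(command.primitive, 0) + 1
    return counts


# A display list is simply an ordered sequence of such commands.
display_list = [
    DrawCommand("triangle", [(0, 0), (10, 0), (0, 10)]),
    DrawCommand("line", [(5, 5), (20, 20)]),
    DrawCommand("triangle", [(10, 10), (20, 10), (10, 20)]),
]

print(count_primitives(display_list))
```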

The Graphics Synthesiser 200 is a video accelerator that performs rendering of the display lists produced by the Emotion Engine 100. The Graphics Synthesiser 200 includes a graphics interface unit (GIF) which handles, tracks and manages the multiple display lists. The rendering function of the Graphics Synthesiser 200 can generate image data that supports several alternative standard output image formats, i.e., NTSC/PAL, High Definition TV and VESA. In general, the rendering capability of graphics systems is defined by the memory bandwidth between a pixel engine and a video memory, each of which is located within the graphics processor. Conventional graphics systems use external Video Random Access Memory (VRAM) connected to the pixel logic via an off-chip bus which tends to restrict available bandwidth. However, the Graphics Synthesiser 200 of the PlayStation2 provides the pixel logic and the video memory on a single high-performance chip which allows for a comparatively large 38.4 Gigabyte per second memory access bandwidth. The Graphics Synthesiser is theoretically capable of achieving a peak drawing capacity of 75 million polygons per second. Even with a full range of effects such as textures, lighting and transparency, a sustained rate of 20 million polygons per second can be drawn continuously. Accordingly, the Graphics Synthesiser 200 is capable of rendering a film-quality image.

The Sound Processor Unit (SPU) 300 is effectively the soundcard of the system which is capable of handling 3D digital sound such as Digital Theater Surround (DTS®) sound and AC-3 (also known as Dolby Digital) which is the sound format used for DVDs.

A display and sound output device 305, such as a video monitor or television set with an associated loudspeaker arrangement 310, is connected to receive video and audio signals from the graphics synthesiser 200 and the sound processing unit 300.

The main memory supporting the Emotion Engine 100 is the RDRAM (Rambus Dynamic Random Access Memory) module 500 licensed by Rambus Incorporated. This RDRAM memory subsystem comprises RAM, a RAM controller and a bus connecting the RAM to the Emotion Engine 100.

FIG. 2 schematically illustrates the architecture of the Emotion Engine 100 of FIG. 1. The Emotion Engine 100 is a collective term for a number of processing units interconnected to give a desired set of functionality. Viewed in this context, the Emotion Engine comprises: a floating point unit (FPU) 104; a central processing unit (CPU) core 102; vector unit zero (VU0) 106; vector unit one (VU1) 108; a graphics interface unit (GIF) 110; an interrupt controller (INTC) 112; a timer unit 114; a direct memory access controller (DMAC) 116; an image data processor unit (IPU) 118; a dynamic random access memory controller (DRAMC) 120; and a sub-bus interface (SIF) 122. All of these individual processing units are connected via a 128-bit main bus 124.

The CPU core 102 is a 128-bit processor clocked at 300 MHz (in fact 294.912 MHz, but 300 MHz tends to be used as shorthand for this figure). The CPU core has access to 32 MB of main memory via the DRAMC 120. The CPU core 102 instruction set is based on MIPS III RISC with some MIPS IV RISC instructions together with additional multimedia instructions. MIPS III and IV are Reduced Instruction Set Computer (RISC) instruction set architectures proprietary to MIPS Technologies, Inc. Standard instructions are 64-bit, two-way superscalar, which means that two instructions can be executed simultaneously. Multimedia instructions, on the other hand, use 128-bit instructions via two pipelines. The CPU core 102 comprises a 16 KB instruction cache, an 8 KB data cache and a 16 KB scratchpad RAM which is a portion of cache connected by a dedicated bus to the CPU, allowing data access independent of the main bus.

The FPU 104 serves as a first co-processor for the CPU core 102. The vector unit 106 acts as a second co-processor. The FPU 104 comprises a floating point division calculator (FDIV). The vector units 106 and 108 perform mathematical operations and are essentially specialised FPUs that are extremely fast at evaluating the multiplication and addition of vector equations. They use Floating-Point Multiply-Adder Calculators (FMACs) for addition and multiplication operations and Floating-Point Dividers (FDIVs) for division and square root operations. The FMACs operate on 32-bit values so when an operation is carried out on a 128-bit value (composed of four 32-bit values) an operation can be carried out on all four parts concurrently. For example, the four component-wise additions needed to add two vectors together can be carried out at the same time. The VUs have built-in memory for storing micro-programs and interface with the rest of the system via Vector Interface Units (VIFs) referred to by the same number as the corresponding Vector Unit. Vector unit zero 106 can work as a coprocessor to the CPU core 102 via a dedicated 128-bit bus so it is essentially a second specialised FPU. Vector unit one 108, on the other hand, has a dedicated bus to the Graphics Synthesiser 200 and thus can be considered as a completely separate processor. The inclusion of two vector units allows the software developer to split up the work between different parts of the CPU and the vector units can be used in either serial or parallel connection.
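As a minimal sketch of the four-lane operation described above, the following models a multiply-add applied to each of the four 32-bit components of a 128-bit value. The function name fmac4 is hypothetical, and the Python loop runs the lanes one after another, whereas the hardware FMACs are described as handling all four parts concurrently.

```python
def fmac4(acc, a, b):
    """Four-lane multiply-add: returns acc[i] + a[i] * b[i] for each lane.

    Each argument stands for one 128-bit value viewed as four components.
    This is an illustrative model only, not the vector unit instruction set.
    """
    if not (len(acc) == len(a) == len(b) == 4):
        raise ValueError("each operand must have exactly four lanes")
    return [acc[i] + a[i] * b[i] for i in range(4)]


# Adding two vectors is the special case where the multiplier is all ones.
v1 = [1.0, 2.0, 3.0, 4.0]
v2 = [10.0, 20.0, 30.0, 40.0]
ones = [1.0, 1.0, 1.0, 1.0]
print(fmac4(v1, v2, ones))
```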

Vector unit zero 106 comprises 4 FMACs and 1 FDIV. It is connected to the CPU core 102 via a coprocessor connection. It has 4 KB of vector unit memory for data and 4 KB of micro-memory for instructions. Vector unit zero 106 is useful for performing physics calculations associated with the images for display. It primarily executes non-patterned geometric processing together with the CPU core 102.

Vector unit one 108 comprises 5 FMACs and 2 FDIVs. It has no direct path to the CPU core 102, although it does have a direct path to the GIF unit 110. It has 16 KB of vector unit memory for data and 16 KB of micro-memory for instructions. Vector unit one 108 is useful for performing transformations. It primarily executes patterned geometric processing and directly outputs a generated display list to the GIF 110.

The GIF 110 is an interface unit to the Graphics Synthesiser 200. It converts data according to a tag specification at the beginning of a display list packet and transfers drawing commands to the Graphics Synthesiser 200 whilst mutually arbitrating multiple transfers. The interrupt controller (INTC) 112 serves to arbitrate interrupts from peripheral devices, except the DMAC 116.

The timer unit 114 comprises four independent timers with 16-bit counters. The timers are driven either by the bus clock (at 1/16 or 1/256 intervals) or via an external clock. The DMAC 116 handles data transfers between main memory and scratchpad RAM, or between main memory or scratchpad RAM and peripherals. It arbitrates the main bus 124 at the same time. Performance optimisation of the DMAC 116 is a key way by which to improve Emotion Engine performance. The image processing unit (IPU) 118 is an image data processor that is used to expand compressed animations and texture images. It performs macro-block decoding, colour space conversion and vector quantisation. Finally, the sub-bus interface (SIF) 122 is an interface unit to the IOP 700. The IOP has its own memory and bus to control I/O devices such as sound chips and storage devices.

FIG. 3 schematically illustrates the configuration of the Graphics Synthesiser 200. The Graphics Synthesiser comprises: a host interface 202; a pixel pipeline 206; a memory interface 208; a local memory 212 including a frame page buffer 214 and a texture page buffer 216; and a video converter 210.

The host interface 202 transfers data with the host (in this case the GIF 110). Both drawing data and buffer data from the host pass through this interface. The output from the host interface 202 is supplied to the graphics synthesiser 200 which develops the graphics to draw pixels based on vertex information received from the Emotion Engine 100, and calculates information such as RGBA value, depth value (i.e. Z-value), texture value and fog value for each pixel. The RGBA value specifies the red, green, blue (RGB) colour components and the A (Alpha) component represents opacity of an image object. The Alpha value can range from completely transparent to totally opaque. The pixel data is supplied to the pixel pipeline 206 which performs processes such as texture mapping, fogging and Alpha-blending and determines the final drawing colour based on the calculated pixel information.

The pixel pipeline 206 comprises 16 pixel engines PE1, PE2, . . . , PE16 so that it can process a maximum of 16 pixels concurrently. The pixel pipeline 206 runs at 150 MHz (in fact 294.912/2 MHz) with 32-bit colour and a 32-bit Z-buffer. The memory interface 208 reads data from and writes data to the local Graphics Synthesiser memory 212. It writes the drawing pixel values (RGBA and Z) to memory at the end of a pixel operation and reads the pixel values of the frame buffer 214 from memory. These pixel values read from the frame buffer 214 are used for pixel test or Alpha-blending. The memory interface 208 also reads from local memory 212 the RGBA values for the current contents of the frame buffer. The local memory 212 is a 32 Mbit (4 MB) memory that is built in to the Graphics Synthesiser 200. It can be organised as a frame buffer 214, texture buffer 216 and a Z-buffer 215. The frame buffer 214 is the portion of video memory where pixel data such as colour information is stored.

The Graphics Synthesiser uses a 2D to 3D texture mapping process to add visual detail to 3D geometry. Each texture may be wrapped around a 3D image object and is stretched and skewed to give a 3D graphical effect. The texture buffer is used to store the texture information for image objects. The Z-buffer 215 (also known as depth buffer) is the memory available to store the depth information for a pixel. Images are constructed from basic building blocks known as graphics primitives or polygons. When a polygon is rendered with Z-buffering, the depth value of each of its pixels is compared with the corresponding value stored in the Z-buffer. If the value stored in the Z-buffer is greater than or equal to the depth of the new pixel value then this pixel is determined visible so that it should be rendered and the Z-buffer will be updated with the new pixel depth. If however the Z-buffer depth value is less than the new pixel depth value the new pixel value is behind what has already been drawn and will not be rendered. Alternative Z-buffer tests are available, so that (a) the new pixel always replaces the previous value, or (b) the new pixel replaces the previous pixel value if its depth is greater than or equal to the previous value stored in the Z buffer.
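The Z-buffer test described above, together with the two alternative tests (a) and (b), can be sketched as follows. This is an illustration under stated assumptions rather than the Graphics Synthesiser's actual logic: the function names z_test and plot, the mode strings and the use of nested lists as buffers are all hypothetical.

```python
def z_test(stored_depth, new_depth, mode="default"):
    """Return True if the new pixel should be drawn.

    mode "default":       draw if the stored depth >= the new depth
    mode "always":        alternative (a), the new pixel always wins
    mode "greater_equal": alternative (b), draw if new depth >= stored depth
    """
    if mode == "always":
        return True
    if mode == "greater_equal":
        return new_depth >= stored_depth
    return stored_depth >= new_depth


def plot(zbuf, frame, x, y, depth, colour, mode="default"):
    """Apply the depth test and, on success, update both buffers."""
    if z_test(zbuf[y][x], depth, mode):
        zbuf[y][x] = depth
        frame[y][x] = colour
        return True
    return False


# A one-pixel example: the stored depth starts at 10.
zbuf = [[10]]
frame = [[None]]
print(plot(zbuf, frame, 0, 0, 5, "red"))    # nearer pixel is drawn
print(plot(zbuf, frame, 0, 0, 7, "blue"))   # farther pixel is rejected
```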

The local memory 212 has a 1024-bit read port and a 1024-bit write port for accessing the frame buffer and Z-buffer and a 512-bit port for texture reading. The video converter 210 is operable to display the contents of the frame memory in a specified output format.

An arrangement will now be described to allow the emulation of the system described with reference to FIGS. 1 to 3. Note that for convenience, the processing units shown in FIGS. 1 to 3 will be referred to as “emulated” processing units, whereas in the emulating system to be described below, processing units of that (emulating) system will be referred to as “emulating” processing units. To avoid any possible confusion, note that both categories of processing units (“emulated” and “emulating” processing units) represent physical processing units capable of running native software appropriate to those processing units.

FIG. 4 schematically illustrates the overall system architecture of the Sony® Playstation 3® entertainment device. A system unit 910 is provided, with various peripheral devices connectable to the system unit.

The system unit 910 comprises: a Cell processor 1100; a Rambus® dynamic random access memory (XDRAM) unit 1500; a Reality Synthesiser graphics unit 1200 with a dedicated video random access memory (VRAM) unit 1250; and an I/O bridge 1700.

The system unit 910 also comprises a Blu Ray® Disk BD-ROM® optical disk reader 1430 for reading a disk 1440 and a removable slot-in hard disk drive (HDD) 1400, accessible through the I/O bridge 1700. Optionally the system unit also comprises a memory card reader 1450 for reading compact flash memory cards, Memory Stick® memory cards and the like, which is similarly accessible through the I/O bridge 1700.

The I/O bridge 1700 also connects to six Universal Serial Bus (USB) 2.0 ports 1710; a gigabit Ethernet port 1720; an IEEE 802.11b/g wireless network (Wi-Fi) port 1730; and a Bluetooth® wireless link port 1740 capable of supporting up to seven Bluetooth connections.

In operation the I/O bridge 1700 handles all wireless, USB and Ethernet data, including data from one or more game controllers 1751. For example, when a user is playing a game, the I/O bridge 1700 receives data from the game controller 1751 via a Bluetooth link and directs it to the Cell processor 1100, which updates the current state of the game accordingly.

The wireless, USB and Ethernet ports also provide connectivity for other peripheral devices in addition to game controllers 1751, such as: a remote control 1752; a keyboard 1753; a mouse 1754; a portable entertainment device 1755 such as a Sony Playstation Portable® entertainment device; a video camera such as an EyeToy® video camera 1756; and a microphone headset 1757. Such peripheral devices may therefore in principle be connected to the system unit 910 wirelessly; for example the portable entertainment device 1755 may communicate via a Wi-Fi ad-hoc connection, whilst the microphone headset 1757 may communicate via a Bluetooth link.

The provision of these interfaces means that the Playstation 3 device is also potentially compatible with other peripheral devices such as digital video recorders (DVRs), set-top boxes, digital cameras, portable media players, Voice over IP telephones, mobile telephones, printers and scanners.

In addition, a legacy memory card reader 1410 may be connected to the system unit via a USB port 1710, enabling the reading of memory cards 1420 of the kind used by the Playstation® or Playstation 2® devices.

In the present embodiment, the game controller 1751 is operable to communicate wirelessly with the system unit 910 via the Bluetooth link. However, the game controller 1751 can instead be connected to a USB port, thereby also providing power by which to charge the battery of the game controller 1751. In addition to one or more analogue joysticks and conventional control buttons, the game controller is sensitive to motion in 6 degrees of freedom, corresponding to translation and rotation in each axis. Consequently gestures and movements by the user of the game controller may be translated as inputs to a game in addition to or instead of conventional button or joystick commands. Optionally, other wirelessly enabled peripheral devices such as the Playstation Portable device may be used as a controller. In the case of the Playstation Portable device, additional game or control information (for example, control instructions or number of lives) may be provided on the screen of the device. Other alternative or supplementary control devices may also be used, such as a dance mat (not shown), a light gun (not shown), a steering wheel and pedals (not shown) or bespoke controllers, such as a single or several large buttons for a rapid-response quiz game (also not shown).

The remote control 1752 is also operable to communicate wirelessly with the system unit 910 via a Bluetooth link. The remote control 1752 comprises controls suitable for the operation of the Blu Ray Disk BD-ROM reader 1430 and for the navigation of disk content.

The Blu Ray Disk BD-ROM reader 1430 is operable to read CD-ROMs compatible with the Playstation and PlayStation 2 devices, in addition to conventional pre-recorded and recordable CDs, and so-called Super Audio CDs. The reader 1430 is also operable to read DVD-ROMs compatible with the Playstation 2 and PlayStation 3 devices, in addition to conventional pre-recorded and recordable DVDs. The reader 1430 is further operable to read BD-ROMs compatible with the Playstation 3 device, as well as conventional pre-recorded and recordable Blu-Ray Disks.

The system unit 910 is operable to supply audio and video, either generated or decoded by the Playstation 3 device via the Reality Synthesiser graphics unit 1200, through audio and video connectors to a display and sound output device 1300 such as a monitor or television set, having a display screen 1305 and one or more loudspeakers 1310. The audio connectors 1210 may include conventional analogue and digital outputs whilst the video connectors 1220 may variously include component video, S-video, composite video and one or more High Definition Multimedia Interface (HDMI) outputs. Consequently, video output may be in formats such as PAL or NTSC, or in 720p, 1080i or 1080p high definition.

Audio processing (generation, decoding and so on) is performed by the Cell processor 1100. The Playstation 3 device's operating system supports Dolby® 5.1 surround sound, Dolby® Theatre Surround (DTS), and the decoding of 7.1 surround sound from Blu-Ray® disks.

In the present embodiment, the video camera 1756 comprises a single charge coupled device (CCD), an LED indicator, and hardware-based real-time data compression and encoding apparatus so that compressed video data may be transmitted in an appropriate format such as an intra-image based MPEG (motion picture expert group) standard for decoding by the system unit 910. The camera LED indicator is arranged to illuminate in response to appropriate control data from the system unit 910, for example to signify adverse lighting conditions. Embodiments of the video camera 1756 may variously connect to the system unit 910 via a USB, Bluetooth or Wi-Fi communication port. Embodiments of the video camera may include an associated microphone and also be capable of transmitting audio data. In embodiments of the video camera, the CCD may have a resolution suitable for high-definition video capture. In use, images captured by the video camera may for example be incorporated within a game or interpreted as game control inputs.

In general, in order for successful data communication to occur with a peripheral device such as a video camera or remote control via one of the communication ports of the system unit 910, an appropriate piece of software such as a device driver should be provided. Device driver technology is well-known and will not be described in detail here, except to say that the skilled man will be aware that a device driver or similar software interface may be required in the present embodiment.

Referring now to FIG. 5, the Cell processor 1100 has an architecture comprising four basic components: external input and output structures comprising a memory controller 1160 and a dual bus interface controller 1170A,B; a main processor referred to as the Power Processing Element (PPE) 1150; eight co-processors referred to as Synergistic Processing Elements (SPEs) 1110A-H; and a circular data bus connecting the above components referred to as the Element Interconnect Bus 1180. The total floating point performance of the Cell processor is 218 GFLOPS, compared with the 6.2 GFLOPs of the Playstation 2 device's Emotion Engine.

The Power Processing Element (PPE) 1150 is based upon a two-way simultaneous multithreading Power 970 compliant PowerPC core (PP) 1155 running with an internal clock of 3.2 GHz. It comprises a 512 kB level 2 (L2) cache and a 32 kB level 1 (L1) cache. The PPE 1150 is capable of eight single precision operations per clock cycle, translating to 25.6 GFLOPs at 3.2 GHz. The primary role of the PPE 1150 is to act as a controller for the Synergistic Processing Elements 1110A-H, which handle most of the computational workload. In operation the PPE 1150 maintains a job queue, scheduling jobs for the Synergistic Processing Elements 1110A-H and monitoring their progress. Consequently each Synergistic Processing Element 1110A-H runs a kernel whose role is to fetch a job, execute it and synchronise with the PPE 1150.
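The controller and worker arrangement described above (a job queue maintained by the PPE, with each SPE running a kernel that fetches a job, executes it and synchronises) may be sketched, purely by analogy, using ordinary threads. The names spe_kernel and run_jobs, the use of None as an end-of-work marker and the results list are assumptions made for this sketch; real Cell software would use the platform's own facilities rather than Python threads.

```python
import queue
import threading


def spe_kernel(jobs, results, lock):
    """Hypothetical SPE-side loop: fetch a job, execute it, report back."""
    while True:
        job = jobs.get()
        if job is None:              # assumed marker meaning "no more work"
            jobs.task_done()
            return
        func, arg = job
        value = func(arg)            # execute the job
        with lock:                   # synchronise with the controller
            results.append((arg, value))
        jobs.task_done()


def run_jobs(work, num_spes=8):
    """Hypothetical PPE-side controller: queue jobs and wait for completion."""
    jobs = queue.Queue()
    results = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=spe_kernel, args=(jobs, results, lock))
        for _ in range(num_spes)
    ]
    for worker in workers:
        worker.start()
    for item in work:
        jobs.put(item)
    for _ in workers:
        jobs.put(None)               # one end marker per worker
    jobs.join()                      # monitor progress until all jobs finish
    for worker in workers:
        worker.join()
    return sorted(results)


def square(n):
    return n * n


print(run_jobs([(square, n) for n in range(5)], num_spes=3))
```

Results are sorted before being returned because the order in which workers finish is not deterministic.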

Each Synergistic Processing Element (SPE) 1110A-H comprises a respective Synergistic Processing Unit (SPU′, so written to distinguish it from the Sound Processing Unit mentioned above) 1120A-H, and a respective Memory Flow Controller (MFC) 1140A-H comprising in turn a respective Dynamic Memory Access Controller (DMAC) 1142A-H, a respective Memory Management Unit (MMU) 1144A-H and a bus interface (not shown). Each SPU′ 1120A-H is a RISC processor clocked at 3.2 GHz and comprising 256 kB local RAM 1130A-H, expandable in principle to 4 GB. Each SPE gives a theoretical 25.6 GFLOPS of single precision performance. An SPU′ can operate on 4 single precision floating point numbers, 4 32-bit numbers, 8 16-bit integers, or 16 8-bit integers in a single clock cycle. In the same clock cycle it can also perform a memory operation. The SPU′ 1120A-H does not directly access the system memory XDRAM 1500; the 64-bit addresses formed by the SPU′ 1120A-H are passed to the MFC 1140A-H which instructs its DMA controller 1142A-H to access memory via the Element Interconnect Bus 1180 and the memory controller 1160.

The Element Interconnect Bus (EIB) 1180 is a logically circular communication bus internal to the Cell processor 1100 which connects the above processor elements, namely the PPE 1150, the memory controller 1160, the dual bus interface 1170A,B and the 8 SPEs 1110A-H, totalling 12 participants. Participants can simultaneously read and write to the bus at a rate of 8 bytes per clock cycle. As noted previously, each SPE 1110A-H comprises a DMAC 1142A-H for scheduling longer read or write sequences. The EIB comprises four channels, two each in clockwise and anti-clockwise directions. Consequently for twelve participants, the longest step-wise data-flow between any two participants is six steps in the appropriate direction. The theoretical peak instantaneous EIB bandwidth for 12 slots is therefore 96 B (bytes) per clock, in the event of full utilisation through arbitration between participants. This equates to a theoretical peak bandwidth of 307.2 GB/s (gigabytes per second) at a clock rate of 3.2 GHz.
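The figures quoted above follow from simple arithmetic, which can be checked with the short sketch below. The constant and function names are hypothetical; the model assumes participants are evenly spaced around the ring and that data may travel in whichever direction is shorter.

```python
PARTICIPANTS = 12             # PPE, memory controller, two bus interfaces, 8 SPEs
BYTES_PER_CLOCK = 8           # per participant, as stated in the text
CLOCK_HZ = 3_200_000_000      # 3.2 GHz

# Peak bytes per clock with every participant transferring at once.
peak_bytes_per_clock = PARTICIPANTS * BYTES_PER_CLOCK

# Peak bandwidth in gigabytes per second (1 GB taken as 10**9 bytes).
peak_gb_per_s = peak_bytes_per_clock * CLOCK_HZ / 1e9


def ring_steps(src, dst, n=PARTICIPANTS):
    """Steps between two positions on a ring, taking the shorter direction."""
    clockwise = (dst - src) % n
    return min(clockwise, n - clockwise)


# Worst case over all destinations as seen from position 0.
longest_path = max(ring_steps(0, k) for k in range(PARTICIPANTS))

print(peak_bytes_per_clock, peak_gb_per_s, longest_path)
```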

The memory controller 1160 comprises an XDRAM interface 1162, developed by Rambus Incorporated. The memory controller interfaces with the Rambus XDRAM 1500 with a theoretical peak bandwidth of 25.6 GB/s.

The dual bus interface 1170A,B comprises a Rambus FlexIO® system interface 1172A,B. The interface is organised into 12 channels each being 8 bits wide, with five paths being inbound and seven outbound. This provides a theoretical peak bandwidth of 62.4 GB/s (36.4 GB/s outbound, 26 GB/s inbound) between the Cell processor and the I/O Bridge 1700 via the controller 1170A and the Reality Synthesiser graphics unit 1200 via controller 1170B.

Data sent by the Cell processor 1100 to the Reality Synthesiser graphics unit 1200 will typically comprise display lists, being a sequence of commands to draw vertices, apply textures to polygons, specify lighting conditions, and so on.

Referring now to FIG. 6, the Reality Simulator graphics (RSX) unit 1200 is a video accelerator based upon the NVidia® G70/71 architecture that processes and renders lists of commands produced by the Cell processor 1100. The RSX unit 1200 comprises a host interface 1202 operable to communicate with the bus interface controller 1170B of the Cell processor 1100; a vertex pipeline 1204 (VP) comprising eight vertex shaders 1205; a pixel pipeline 1206 (PP) comprising 24 pixel shaders 1207; a render pipeline 1208 (RP) comprising eight render output units (ROPs) 1209; a memory interface 1210; and a video converter 1212 for generating a video output. The RSX 1200 is complemented by 256 MB double data rate (DDR) video RAM (VRAM) 1250, clocked at 600 MHz and operable to interface with the RSX 1200 at a theoretical peak bandwidth of 25.6 GB/s. In operation, the VRAM 1250 maintains a frame buffer 1214 and a texture buffer 1216. The texture buffer 1216 provides textures to the pixel shaders 1207, whilst the frame buffer 1214 stores results of the processing pipelines. The RSX can also access the main memory 1500 via the EIB 1180, for example to load textures into the VRAM 1250.

The vertex pipeline 1204 primarily processes deformations and transformations of vertices defining polygons within the image to be rendered.

The pixel pipeline 1206 primarily processes the application of colour, textures and lighting to these polygons, including any pixel transparency, generating red, green, blue and alpha (transparency) values for each processed pixel. Texture mapping may simply apply a graphic image to a surface, or may include bump-mapping (in which the notional direction of a surface is perturbed in accordance with texture values to create highlights and shade in the lighting model) or displacement mapping (in which the applied texture additionally perturbs vertex positions to generate a deformed surface consistent with the texture).

The render pipeline 1208 performs depth comparisons between pixels to determine which should be rendered in the final image. Optionally, if the intervening pixel process will not affect depth values (for example in the absence of transparency or displacement mapping) then the render pipeline and vertex pipeline 1204 can communicate depth information between them, thereby enabling the removal of occluded elements prior to pixel processing, and so improving overall rendering efficiency. In addition, the render pipeline 1208 also applies subsequent effects such as full-screen anti-aliasing over the resulting image.

Both the vertex shaders 1205 and pixel shaders 1207 are based on the shader model 3.0 standard. Up to 136 shader operations can be performed per clock cycle, with the combined pipeline therefore capable of 74.8 billion shader operations per second, outputting up to 840 million vertices and 10 billion pixels per second. The total floating point performance of the RSX 1200 is 1.8 TFLOPS.

Typically, the RSX 1200 operates in close collaboration with the Cell processor 1100; for example, when displaying an explosion, or weather effects such as rain or snow, a large number of particles must be tracked, updated and rendered within the scene. In this case, the PPU 1155 of the Cell processor may schedule one or more SPEs 1110A-H to compute the trajectories of respective batches of particles. Meanwhile, the RSX 1200 accesses any texture data (e.g. snowflakes) not currently held in the video RAM 1250 from the main system memory 1500 via the element interconnect bus 1180, the memory controller 1160 and a bus interface controller 1170B. The or each SPE 1110A-H outputs its computed particle properties (typically coordinates and normals, indicating position and attitude) directly to the video RAM 1250; the DMA controller 1142A-H of the or each SPE 1110A-H addresses the video RAM 1250 via the bus interface controller 1170B. Thus in effect the assigned SPEs become part of the video processing pipeline for the duration of the task.

In general, the PPU 1155 can assign tasks in this fashion to six of the eight SPEs available; one SPE is reserved for the operating system, whilst one SPE is optionally disabled. The disabling of one SPE provides a greater level of tolerance during fabrication of the Cell processor, as it allows for one SPE to fail the fabrication process. Alternatively if all eight SPEs are functional, then the eighth SPE provides scope for redundancy in the event of subsequent failure by one of the other SPEs during the life of the Cell processor.

The PPU 1155 can assign tasks to SPEs in several ways. For example, SPEs may be chained together to handle each step in a complex operation, such as accessing a DVD, video and audio decoding, and error masking, with each step being assigned to a separate SPE. Alternatively or in addition, two or more SPEs may be assigned to operate on input data in parallel, as in the particle animation example above.
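
The two modes of task assignment just described may be sketched as follows. This is a minimal illustration in Python, which is not the language of the described system; the functions run_chain and run_parallel, and the stand-in stages, are hypothetical. The sketch executes sequentially and shows only how work would be divided, not how it would be scheduled on real SPEs.

```python
from typing import Callable, List, Sequence


def run_chain(stages: Sequence[Callable], item):
    """Chained mode: each stage stands in for one SPE; its output feeds the next."""
    for stage in stages:
        item = stage(item)
    return item


def run_parallel(worker: Callable, data: List, n_workers: int) -> List:
    """Parallel mode: split the data into n_workers batches, one per SPE stand-in."""
    batches = [data[i::n_workers] for i in range(n_workers)]
    results = [[worker(x) for x in batch] for batch in batches]
    # Re-interleave the batch results so that output order matches input order.
    out = [None] * len(data)
    for w, batch_result in enumerate(results):
        out[w::n_workers] = batch_result
    return out


# Hypothetical stages standing in for disc access, decoding and error masking.
chain_result = run_chain([lambda x: x + 1, lambda x: x * 2, lambda x: x - 3], 5)

# Hypothetical particle update: advance each position by one unit.
parallel_result = run_parallel(lambda p: p + 1.0, [0.0, 1.0, 2.0, 3.0, 4.0], 2)
```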

Software instructions implemented by the Cell processor 1100 and/or the RSX 1200 may be supplied at manufacture and stored on the HDD 1400, and/or may be supplied on a data carrier or storage medium such as an optical disk or solid state memory, or via a transmission medium such as a wired or wireless network or internet connection, or via combinations of these.

The software supplied at manufacture comprises system firmware and the Playstation 3 device's operating system (OS). In operation, the OS provides a user interface enabling a user to select from a variety of functions, including playing a game, listening to music, viewing photographs, or viewing a video. The interface takes the form of a so-called cross media-bar (XMB), with categories of function arranged horizontally. The user navigates by moving through the functions horizontally using a game controller 1751, remote control 1752 or other suitable control device so as to highlight the desired function, at which point options pertaining to that function appear as a vertically scrollable list centred on that function, which may be navigated in analogous fashion. However, if a game, audio or movie disk 1440 is inserted into the BD-ROM optical disk reader 1430, the Playstation 3 device may select appropriate options automatically (for example, by commencing the game), or may provide relevant options (for example, to select between playing an audio disk or compressing its content to the HDD 1400).

In addition, the OS provides an on-line capability, including a web browser, an interface with an on-line store from which additional game content, demos and other media may be downloaded, and a friends management capability, providing on-line communication with other Playstation 3 device users nominated by the user of the current device; for example, by text, audio or video depending on the peripheral devices available. The on-line capability also provides for on-line communication, content download and content purchase during play of a suitably configured game, and for updating the firmware and OS of the Playstation 3 device itself.

The operation of the PS2 arrangement described with reference to FIGS. 1 to 3 is reproduced (or very nearly reproduced) by software running on the arrangement of FIGS. 4 to 6, despite the fact that the emulating processing units of FIGS. 4 to 6 have (in general terms) a different architecture, speed, memory accessing capabilities and so on, compared to the emulated processing units of FIGS. 1 to 3.

The reproduction of the operation of the PS2 arrangement is an emulation rather than a simulation. That is to say, it is not the case that all of the operations contributing to the functionality of the PS2 are reproduced by the emulating system in a lock-step, clock-by-clock manner. Rather, some functions may be carried out by time division on a single emulating processing unit, and in general the processing units communicate with one another only when there is a need (within the emulated system) to do so.

The PPE 1150 controls the overall operation of the emulating system and runs an operating system (OS) for the emulating system. One thread of the PPE interprets native Emotion Engine PS2 instructions into native SPE instructions, which it supplies to the relevant SPEs via the EIB together with any associated information (such as the allocation of emulation functions; see below) required to carry out the respective part of the emulation. Another thread provides the function of recompiling new code native to the emulating system to provide the particular functionality defined by the interpreted PS2 code. Emulation of the various parts of the PS2 system described above is devolved to the eight SPEs acting as emulating processing units, which emulate PS2 functionality as set out below. It will be appreciated that the precise identity of the individual SPEs is just a convenient notation and has no technical significance because of the nature of the message-passing interface between the SPEs. So, for example, the operations assigned to the SPEs 1110A and 1110B could be swapped in their entirety with no technical effect on the overall emulation process. It will also be appreciated that the disabling of one SPE can be carried out as mentioned above, so that the tasks are in fact divided amongst the remaining seven SPEs.

SPE 1110A    IPU 118
SPE 1110B    Emotion Engine CPU 102 and Vector Unit 0
SPE 1110C    VIF 0, VIF 1, GIF 110
SPE 1110D    Vector Unit 1
SPE 1110E    GS 200 (i.e. that part of the operation of the GS 200 specific to the PS2 system; the SPE 1110G also interfaces with a graphics controller of the emulating system (not shown) for non-PS2-specific graphics operations)
SPE 1110F    generally unused, but can be used to recompile code to emulate Vector Unit 1, to ease the load on the one thread of the PPE 1150 described above
SPE 1110G    SPU 300
SPE 1110H    IOP 700 and SIF 122
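
Purely as an illustration, the allocation listed above can be held as a lookup table from which it is straightforward to find the SPE hosting a given emulated unit, or to list the SPEs hosting more than one unit. The sketch below is in Python for convenience; the names ALLOCATION and host_of are hypothetical, and only the primary allocation is shown (the optional recompilation role of SPE 1110F is omitted).

```python
# Allocation as listed in the text; the dictionary layout itself is illustrative.
ALLOCATION = {
    "SPE 1110A": ["IPU 118"],
    "SPE 1110B": ["Emotion Engine CPU 102", "Vector Unit 0"],
    "SPE 1110C": ["VIF 0", "VIF 1", "GIF 110"],
    "SPE 1110D": ["Vector Unit 1"],
    "SPE 1110E": ["GS 200"],
    "SPE 1110F": [],  # generally unused
    "SPE 1110G": ["SPU 300"],
    "SPE 1110H": ["IOP 700", "SIF 122"],
}


def host_of(unit: str) -> str:
    """Return the SPE to which the named emulated unit is primarily allocated."""
    for spe, units in ALLOCATION.items():
        if unit in units:
            return spe
    raise KeyError(unit)


# SPEs which emulate two or more emulated processing units.
multi_unit_spes = sorted(spe for spe, units in ALLOCATION.items() if len(units) > 1)
```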

The PS2 used a conventional bus for communication between the various emulated processing units. The emulating system makes use of the EIB 1180 for passing messages between SPEs and between an SPE, the PPE 1150 and the I/O bridge 1700 (and/or other system devices such as the RSX 1200). The PS2 system had conventional memory access arrangements to access the RDRAM 500. The emulating system uses a distributed DMA system (the DMA controllers 1142A-H). The main system memory 1500 is treated as a common memory “pool”, with all SPEs having access to it.

Each of the SPEs runs locally, on its own time clock. The SPEs run software to allow parts of the functionality of the PS2 system to be emulated. As between emulating processing units (here, the SPEs), synchronisation is required only when the emulated processing units which they emulate need to communicate with one another. At that time, synchronisation takes place just between the devices concerned, using a message transfer mechanism via the EIB.

To achieve this, when synchronisation (of emulated functionality) is required between two SPEs, one of the SPEs places a message onto the EIB (including a source SPE identifier, a destination SPE identifier etc), addressed to the other of the SPEs. The message may include a request for a certain piece of data, or may include a data item which is being sent to that other SPE involved in that particular synchronisation. When an acknowledgement is returned by that other SPE, the transaction is complete. This is a reliable but rather slow method of synchronising two emulated processors.
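
The message exchange described above may be sketched, by way of example only, as a data message followed by an acknowledgement. In the following Python sketch the classes Message and Bus, the function synchronise and the message kinds are all hypothetical; the Bus class is merely a stand-in for the EIB and models one first-in first-out mailbox per participant.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    source: str          # source SPE identifier
    destination: str     # destination SPE identifier
    kind: str            # "data", "request" or "ack" (hypothetical kinds)
    payload: object = None


class Bus:
    """Stand-in for the EIB: one FIFO mailbox per participant."""

    def __init__(self, participants):
        self.mailboxes = {p: deque() for p in participants}

    def send(self, msg: Message) -> None:
        self.mailboxes[msg.destination].append(msg)

    def receive(self, who: str):
        box = self.mailboxes[who]
        return box.popleft() if box else None


def synchronise(bus: Bus, sender: str, receiver: str, payload):
    """Send one data item and treat the transaction as complete once acknowledged."""
    bus.send(Message(sender, receiver, "data", payload))
    incoming = bus.receive(receiver)                   # receiver takes the message
    bus.send(Message(receiver, sender, "ack"))         # receiver acknowledges
    ack = bus.receive(sender)                          # sender sees the acknowledgement
    return incoming.payload, ack is not None and ack.kind == "ack"


bus = Bus(["SPE 1110B", "SPE 1110C"])
received, completed = synchronise(bus, "SPE 1110B", "SPE 1110C", {"value": 42})
```

The round trip (one message in each direction per transaction) is what makes this mechanism reliable but comparatively slow.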

The way in which the SPEs are logically arranged is shown schematically in FIG. 5. Example paths of logical communication between the SPEs are also shown, although these need not be exhaustive. It can be seen that some functions are shared on the same SPE which avoids entirely the need to use the message-passing mechanism to communicate between them. So, providing the emulation of two (or more) emulated processing units on a single SPE can improve the system's performance by reducing the amount of inter-SPE communication needed.
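
The saving in inter-SPE communication can be illustrated by counting only those exchanges whose two endpoints are emulated on different SPEs. The following Python sketch is illustrative only: the function bus_messages is hypothetical, the traffic counts are invented for the example, and the SEPARATE placement (which moves SIF 122 to SPE 1110F) is a hypothetical alternative used for comparison.

```python
def bus_messages(traffic, placement):
    """Count the exchanges which must cross between different SPEs."""
    return sum(
        count
        for (unit_a, unit_b), count in traffic.items()
        if placement[unit_a] != placement[unit_b]
    )


# Hypothetical numbers of exchanges between pairs of emulated units.
TRAFFIC = {
    ("IOP 700", "SIF 122"): 100,
    ("SIF 122", "Emotion Engine CPU 102"): 10,
}

# Placement in which IOP 700 and SIF 122 share one SPE, as in the text.
SHARED = {
    "IOP 700": "SPE 1110H",
    "SIF 122": "SPE 1110H",
    "Emotion Engine CPU 102": "SPE 1110B",
}

# Hypothetical alternative in which SIF 122 is emulated on an SPE of its own.
SEPARATE = {**SHARED, "SIF 122": "SPE 1110F"}

shared_count = bus_messages(TRAFFIC, SHARED)
separate_count = bus_messages(TRAFFIC, SEPARATE)
```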

Another feature, which for clarity of the diagram is not exhaustively indicated in FIG. 5, is that where the emulated processing units emulated by two different SPEs need to communicate with one another a lot in order to carry out particular functions of the PS2, a part of the functionality of one emulated processing unit can be carried out by the “other” processing unit's SPE.

For example, the PS2 sound processing unit is mostly emulated on one SPE. This processes samples and mixes them into the final samples for output. It also processes accesses to its register map. However, the registers used to write to sound processing unit sample memory are emulated on the IOP's SPE which manages the queuing of accesses, directly accesses the sample memory image in main memory, and raises any interrupts these might cause as though they had been routed from the sound processing unit.
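
A minimal sketch of this division of the sound processing unit's register map is given below, in Python and for illustration only. The register names (SAMPLE_ADDR, SAMPLE_DATA, VOLUME), the classes IopSpe and SoundSpe, the function route_write and the interrupt condition (a write reaching a chosen irq_address) are all hypothetical; the text does not specify them.

```python
# Hypothetical names for the registers used to write to sample memory.
SAMPLE_MEMORY_REGISTERS = {"SAMPLE_ADDR", "SAMPLE_DATA"}


class IopSpe:
    """Stand-in for the IOP's SPE, which handles sample-memory register writes."""

    def __init__(self, sample_memory, irq_address):
        self.sample_memory = sample_memory   # stands in for the image in main memory
        self.irq_address = irq_address       # hypothetical interrupt trigger address
        self.address = 0
        self.interrupts = []

    def write(self, register, value):
        if register == "SAMPLE_ADDR":
            self.address = value
            return
        if self.address == self.irq_address:
            # Raised as though it had been routed from the sound processing unit.
            self.interrupts.append("SPU")
        self.sample_memory[self.address] = value
        self.address += 1


class SoundSpe:
    """Stand-in for the SPE emulating the rest of the sound processing unit."""

    def __init__(self):
        self.registers = {}

    def write(self, register, value):
        self.registers[register] = value


def route_write(register, value, iop_spe, sound_spe):
    """Direct a register write to whichever SPE emulates that register."""
    target = iop_spe if register in SAMPLE_MEMORY_REGISTERS else sound_spe
    target.write(register, value)
    return target


memory = [0, 0, 0, 0]
iop, sound = IopSpe(memory, irq_address=2), SoundSpe()
route_write("SAMPLE_ADDR", 1, iop, sound)
route_write("SAMPLE_DATA", 7, iop, sound)
route_write("SAMPLE_DATA", 9, iop, sound)
route_write("VOLUME", 3, iop, sound)
```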

Another example of a device in the PS2 system which is implemented on more than one SPE is the DMAC, whose function is distributed between the emulating components primarily used for the Emotion Engine, VIF, GIF, IPU and others.

An example of one emulating SPE handling the emulation of multiple PS2 system devices is that the emulation of the PS2's IOP is shared (on a single emulating device) with the bulk of the emulation of a CD disk controller.

This division of the emulation of an emulated processing unit between two (or more) SPEs (emulating processing units) again can reduce the message traffic needed to provide communication between the emulations of those emulated processing units. The PPE can vary the distribution of emulation tasks between the SPEs (during an overall emulation operation), so as to alter which SPE emulates a particular emulated processing unit, and/or which emulated processing units are emulated by a particular SPE.
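
The many-to-many relationship between emulated and emulating processing units, and its variation during operation, may be sketched as follows. The Python below is illustrative only; the table layout and the functions units_on, satisfies_both_conditions and reassign are hypothetical, only a subset of the emulated units is shown, and the reassignment of Vector Unit 1 to SPE 1110F is an invented example.

```python
# Emulated unit -> set of SPEs contributing to its emulation (subset, illustrative).
contributions = {
    "SPU 300": {"SPE 1110G", "SPE 1110H"},  # sample-memory registers on the IOP's SPE
    "IOP 700": {"SPE 1110H"},
    "SIF 122": {"SPE 1110H"},
    "Vector Unit 1": {"SPE 1110D"},
}


def units_on(spe, table):
    """Emulated units to which the given SPE contributes."""
    return {unit for unit, spes in table.items() if spe in spes}


def satisfies_both_conditions(table):
    """True if some unit is split across SPEs and some SPE serves several units."""
    split = any(len(spes) >= 2 for spes in table.values())
    all_spes = set().union(*table.values())
    shared = any(len(units_on(spe, table)) >= 2 for spe in all_spes)
    return split and shared


def reassign(table, unit, old_spe, new_spe):
    """Return a new table in which new_spe takes over old_spe's part of unit."""
    updated = {u: set(spes) for u, spes in table.items()}
    updated[unit].discard(old_spe)
    updated[unit].add(new_spe)
    return updated


# Hypothetical reallocation of the kind the supervising processor might perform.
after = reassign(contributions, "Vector Unit 1", "SPE 1110D", "SPE 1110F")
```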

In so far as the embodiments of the invention described above are implemented, at least in part, using software-controlled data processing apparatus, it will be appreciated that a computer program providing such software control, a storage medium by which such a computer program is stored and a transmission medium by which such a computer program is transmitted are envisaged as aspects of the present invention. It is noted that such software may be provided on a storage medium such as an optical disk or a hardware memory, and/or via a transmission medium such as a network connection or the internet.

1. A data processor comprising a plurality of interconnected emulating processing units arranged to emulate the operation of an emulated processor having a plurality of interconnected emulated processing units, in which: at least one emulated processing unit is emulated by contributions from two or more emulating processing units; and at least one emulating processing unit contributes to emulating two or more emulated processing units.
2. A data processor according to claim 1, in which the emulating processing units communicate with one another by a message-passing communication protocol.
3. A data processor according to claim 2, in which the emulating processing units are arranged to synchronise emulated operations via the message-passing communication protocol.
4. A data processor according to claim 2, in which the emulating processing units are interconnected by a circular data bus.
5. A data processor according to claim 4, in which the data bus is bidirectional.
6. A data processor according to claim 1, comprising a supervisory processor to interpret instructions relating to an emulated processor and provide information to the emulating processing units to allow the interpreted instructions to be executed.
7. A data processor according to claim 6, in which the information supplied to the emulating processing units comprises object code, native to the emulating processing units.
8. A data processor according to claim 6, in which the supervisory processor is operable to vary the allocation of emulation tasks to the emulating processing units, so as to change which emulated processing units are emulated by an emulating processing unit.
9. A data processing method relating to a system having a plurality of interconnected emulating processing units arranged to emulate the operation of an emulated processor having a plurality of interconnected emulated processing units; the method comprising the steps of: emulating an emulated processing unit by contributions from two or more emulating processing units; and emulating two or more emulated processing units by one emulating processing unit.
10. Computer software for carrying out a method according to claim 9.
11. A medium by which computer software according to claim 10 is provided.
12. A medium according to claim 11, the medium being a storage medium.
13. A medium according to claim 11, the medium being a transmission medium.